A. Garas, F. Schweitzer and S. Havlin: 
A k-shell decomposition method for weighted networks

New J. Phys. 14 (2012) 083030 



A k-shell decomposition method for weighted networks

Antonios Garas 1 , Frank Schweitzer 1 and Shlomo Havlin 2 

1 Chair of Systems Design, ETH Zurich, Weinbergstrasse 58 CH-8092 Zurich, Switzerland 
2 Minerva Center and Department of Physics, Bar-Ilan University, 52900 Ramat Gan, Israel 

Abstract 

We present a generalized method for calculating the k-shell structure of weighted networks.
The method takes into account both the weight and the degree of a network, in such a way
that in the absence of weights we recover the shell structure obtained by the classic
k-shell decomposition. In the presence of weights, we show that the method is able to
partition the network in a more refined way, without the need for any arbitrary threshold
on the weight values. Furthermore, by simulating spreading processes using the
susceptible-infectious-recovered model in four different weighted real-world networks, we
show that the weighted k-shell decomposition method ranks the nodes more accurately, by
placing nodes with higher spreading potential into shells closer to the core. In addition,
we demonstrate our new method on a real economic network and show that the core calculated
using the weighted k-shell method is more meaningful from an economic perspective when
compared with the unweighted one.



1 Introduction 

The continuously growing interest in complex network science has resulted, over the past
years, in novel ways of analysing a great number of complex systems in various scientific
fields [1-7]. The fundamental view of this interdisciplinary approach is that large complex
systems can be described as complex networks (or graphs, in mathematical terminology) where
the nodes (or vertices) represent the system's interacting elements and the links (or edges)
represent their interactions. This unified view was used in the analysis of social [7-9],
biological [10-13], physiological [14], technological [15, 16], climate [17-19], economic
[20-23], and financial systems [24, 25]. In combination with the technological advances that
made enormously detailed data available, we are now able to understand and model the
evolution of dynamical processes, like epidemic outbreaks and information spreading [26-30].



Even the earliest empirical works in this field made clear to researchers that the topology
of a network affects its properties. For example, networks with broad degree distributions
are more robust to random failures, but are fragile under intentional attacks [31-35].
Nowadays, there is a growing body of literature trying to understand global properties of a
network by focusing on properties of individual nodes and their connectivity patterns [36].
Of course, the role of individual nodes has a profound relation to the evolution of any
dynamical process, and to the evolution of the network itself. For example, very popular
individuals in a social network (i.e. individuals with a large number of connections)
usually attract more attention and increase their connectivity even further. While it is
clear that such processes affect the evolution of the network topology, we can imagine that
such individuals could also assume key roles in processes such as disease spreading.

Questions like "Who are the most important nodes in the network?" are therefore natural to
ask. Such questions can be addressed using centrality measures, which are the most
frequently used measures when it comes to quantitative network analysis. However, there is
a variety of centrality measures aiming to address the question of node "importance". For
example, there is the degree centrality (or just the degree of a node, i.e. the number of
its links), the eigenvector centrality [37], the betweenness centrality [38], the closeness
centrality [39], etc. In this work we focus on a centrality measure based on the notion of
k-cores, which is a fundamental concept in Graph Theory [40] when it comes to ranking the
centrality of nodes in a complex network. Such ranking was applied in many real networks
[21, 41-48], allowing a thorough investigation of their structure, while highlighting the
role of various topology-dependent processes.

One major limitation of most centrality measures, including the k-core decomposition method,
is that they are designed to work on unweighted graphs. However, in practice, real networks
are weighted, and their weights describe important and well defined properties of the
underlying systems. In a weighted network, nodes have (at least) two properties that can
characterise them: their degree and their weight. Since weights are properties of the
network's links, the node's weight is calculated as the sum over all link weights passing
through a particular node. These two properties, even though correlated in some cases, are
in general independent. As a result, nodes with high degree can have small weight (i.e. they
have many connections with other nodes but the links of these connections have small
weights), while there could also be nodes with small degree and high weight. Situations
where the weights play an important role occur, for example, in economic or trade networks.
In such networks the weights are related to some measured property (like trade flow, capital
flow etc.), and in many cases one wishes to focus on nodes with high weights that are
(usually) the most important players. Thus, in such systems the presence of nodes with high
degree and relatively small weights may influence the results obtained by methods that are
based only on the degree. In such cases two main approaches have been used, both having
their own drawbacks. Under the first approach one completely neglects the weights and
performs the analysis on the unweighted network, thereby neglecting an important property
of the network. The second approach is to consider only links with weights above some
(usually arbitrarily chosen) threshold value and filter out the rest. The drawback of this
approach is the selection of a proper cut-off value, which
may remove important high degree nodes with links of low weights (below the threshold) and as
we will discuss later, this could have a significant impact on the results. Additionally, by
neglecting links below a threshold, the network becomes sparser, with some nodes getting
disconnected and no longer considered by the applied method.

Figure 1: Illustration of the layered structure of a network, obtained using the k-shell
decomposition method. The nodes between the two outer rings belong to shell 1 (k_s = 1),
while the nodes between the two inner rings compose shell 2 (k_s = 2). The nodes within the
central ring constitute the core, in this case k_s = 3.

Here we aim to overcome these limitations by introducing a generalized method for
calculating the k-shell structure of weighted networks. The paper is organized as follows:
first we discuss the standard k-shell decomposition method, and then we introduce our
generalized version. Next, we apply both methods to real networks and present their results.
Subsequently, we compare in more detail the performance of both methods in ranking nodes
according to their importance when it comes to spreading processes, and at the end we
summarize our conclusions.



2 The unweighted k-shell decomposition method

The k-core/k-shell decomposition method partitions a network into sub-structures that are
directly linked to centrality [49]. This method assigns an integer index, k_s, to each node
that is representative of the location of the node in the network, according to its
connectivity patterns. Nodes with low/high values of k_s are located at the periphery/center
of the network. This way, the network is described by a layered structure (similar to the
structure of an onion), revealing the full hierarchy of its nodes. The innermost nodes
belong to the structure called the core or "nucleus" of the network, while the remaining
nodes are placed into more external layers (k-shells).

A more detailed description of how a network is divided into this k-shell structure is the
following (see Fig. 1). First one removes recursively from the network all nodes with degree
k = 1, and assigns the integer value k_s = 1 to them. This procedure is repeated iteratively
until there are only nodes with degree k ≥ 2 left in the network. Subsequently, one removes
all nodes with degree k = 2 and assigns to them the integer value k_s = 2. Again, this
procedure is repeated iteratively until there are only nodes with degree k ≥ 3 left in the
network, and so on. This routine is applied until all nodes of the network have been
assigned to one of the k-shells. This is how the original k-shell decomposition method
works, which, as described above, does not consider the weights of the links at all;
therefore, from now on we will call it the unweighted k-shell decomposition method
(U_k-shell).
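
The pruning routine described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the adjacency-dictionary representation, the function name `k_shell` and the toy graph (a four-node clique with one pendant node B) are assumptions made for the example.

```python
def k_shell(adj):
    """Return {node: k_s} for an undirected graph given as {node: set(neighbors)}."""
    # Work on a copy so the caller's graph is left untouched.
    adj = {u: set(vs) for u, vs in adj.items()}
    shell = {}
    k = 1
    while adj:
        # Keep stripping nodes whose current degree is <= k until none are left;
        # removing a node can push its neighbors down to degree <= k as well.
        while True:
            to_remove = [u for u, vs in adj.items() if len(vs) <= k]
            if not to_remove:
                break
            for u in to_remove:
                shell[u] = k
                for v in adj.pop(u):
                    if v in adj:
                        adj[v].discard(u)
        k += 1
    return shell


# Hypothetical toy graph: clique {A, C, D, E} plus a pendant node B attached to A.
toy = {
    "A": {"B", "C", "D", "E"},
    "B": {"A"},
    "C": {"A", "D", "E"},
    "D": {"A", "C", "E"},
    "E": {"A", "C", "D"},
}
shells = k_shell(toy)
print(shells)  # B lands in shell 1, the clique forms the core with k_s = 3
```

Isolated nodes, if present, would also be placed in shell 1 by this sketch; the case does not arise when only a connected component is analysed.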

3 The weighted k-shell decomposition method

Here we propose a generalization of the k-shell decomposition method, which we call the
weighted k-shell decomposition method (W_k-shell). This method applies the same pruning
routine that was described earlier, but it is based on an alternative measure for the node
degree. This measure considers both the degree of a node and the weights of its links, and
we assign to each node a weighted degree, k'. The weighted degree of a node i is defined as

k'_i = \left[ k_i^{\alpha} \left( \sum_{j}^{k_i} w_{ij} \right)^{\beta} \right]^{\frac{1}{\alpha+\beta}}    (1)

where k_i is the degree of node i, and \sum_{j}^{k_i} w_{ij} is the sum over all its link
weights. In the present study we discuss only the case where α = β = 1, which treats the
weight and the degree equally. The full exploration of the parameter space is outside our
scope, and is left for future work. Therefore, for what follows,
k'_i = \sqrt{ k_i \sum_{j}^{k_i} w_{ij} }.
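
As a quick numerical illustration of Eq. (1), the following sketch evaluates the weighted degree from a list of link weights. The function name `weighted_degree` and the list-of-weights input are assumptions made for this example only.

```python
import math


def weighted_degree(link_weights, alpha=1.0, beta=1.0):
    """Weighted degree k' of Eq. (1) for a node with the given link weights."""
    k = len(link_weights)            # degree k_i
    if k == 0:
        return 0.0                   # isolated node: no links, no weight
    total = sum(link_weights)        # sum over all link weights of the node
    return (k ** alpha * total ** beta) ** (1.0 / (alpha + beta))


# With unit weights the weighted degree equals the ordinary degree.
print(weighted_degree([1, 1, 1]))
# A single link of weight 3 gives sqrt(1 * 3), i.e. about 1.73.
print(weighted_degree([3]))
```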

Using the above approach in the case of unweighted networks, where w_ij = 1, the weighted
degree is equivalent to the node degree (k'_i = k_i), and we recover the same network
partitioning as with the U_k-shell decomposition method. However, in order for a typical
weighted link to be regarded as having unit weight, before we calculate k' using Eq. (1) we
perform the following steps. First, we normalize all the weights with their mean value ⟨w⟩;
next, we divide the resulting weights by their minimum value, and we discretize them by
rounding to the closest integer; this way the minimum link weight is equal to one (see
footnote 1).
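
The preprocessing steps above can be sketched as follows. The dictionary representation of links and the function name `normalize_weights` are assumptions for this example, and positive weights are assumed throughout.

```python
def normalize_weights(weights):
    """Map {link: positive weight} to integer weights whose minimum is 1."""
    # Step 1: normalize by the mean link weight.
    mean_w = sum(weights.values()) / len(weights)
    scaled = {link: w / mean_w for link, w in weights.items()}
    # Step 2: divide by the minimum of the normalized weights.
    min_scaled = min(scaled.values())
    # Step 3: discretize. Python's round() sends exact .5 ties to the even
    # integer; the text does not specify how ties are handled.
    return {link: round(s / min_scaled) for link, s in scaled.items()}


# Hypothetical weights on three links.
raw = {("A", "B"): 2.0, ("A", "C"): 5.2, ("B", "C"): 9.4}
print(normalize_weights(raw))
```

For positive weights the two divisions are algebraically the same as dividing each weight by the smallest weight, so the smallest link always ends up with weight one.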



1 We also tested the effect of the normalization by dividing with the minimum weight, and the results we 
obtained in terms of node positioning with or without the normalization were similar. 



Figure 2: Average degree of all nodes in each shell, obtained using the W_k-shell
decomposition method. The shaded area highlights the full range of the degree values in
each shell. The shells are ranked according to their distance from the core, and the error
bars show the standard deviation. Insets: zoom to distances closer to the core for networks
with a large number of shells.



In Fig. 1 we illustrate schematically the layered structure obtained by applying the
U_k-shell decomposition method to a graph. In order to highlight the weaknesses of the
unweighted method, let us suppose that the network is weighted. For simplicity we assume
that all link weights are equal to one, except for the weight of the link between nodes A
and B, which is w_AB = 3. As illustrated in Fig. 1, the node B is located at the periphery
of the network, even though it is strongly connected to one of the core nodes. In real
networks such a strong link (3 times the capacity of other links) means that this particular
node is of more importance for the core, but this is not reflected in the layered structure
calculated by the classical unweighted approach, since this node will be placed in the
outermost shell (k_s = 1). However, if we apply the W_k-shell decomposition method, then
node B is assigned to k_s = 2, that is, one shell away from the core
of the network, highlighting its actual importance.
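
The effect described above can be reproduced on a small toy graph. The sketch below is not the authors' code: it assumes that k' (with α = β = 1) is rounded to the nearest integer and recomputed on the remaining graph after every removal, and the graph itself (a four-node clique with B attached to A by a link of weight 3) is a hypothetical stand-in for the network of Fig. 1.

```python
import math


def weighted_k_shell(graph):
    """graph: {node: {neighbor: integer link weight}}, symmetric. Returns {node: k_s}."""
    g = {u: dict(nbrs) for u, nbrs in graph.items()}   # private copy

    def k_prime(u):
        # Assumption: weighted degree for alpha = beta = 1, rounded to an integer.
        return round(math.sqrt(len(g[u]) * sum(g[u].values())))

    shell, k = {}, 1
    while g:
        # Same pruning routine as the unweighted method, driven by k'.
        while True:
            to_remove = [u for u in g if k_prime(u) <= k]
            if not to_remove:
                break
            for u in to_remove:
                shell[u] = k
                for v in g.pop(u):
                    if v in g:
                        g[v].pop(u, None)
        k += 1
    return shell


toy = {
    "A": {"B": 3, "C": 1, "D": 1, "E": 1},
    "B": {"A": 3},
    "C": {"A": 1, "D": 1, "E": 1},
    "D": {"A": 1, "C": 1, "E": 1},
    "E": {"A": 1, "C": 1, "D": 1},
}
weighted_shells = weighted_k_shell(toy)
print(weighted_shells["B"], weighted_shells["A"])
```

With all weights set to one the same routine places B in shell 1, so the move to shell 2 is entirely due to the strong link.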



4 Application to real networks 

In order to compare the results obtained from the U_k-shell and the W_k-shell decomposition
methods, we used as case studies the following four real networks:

1. Corporate Ownership Network (CON). This is an economic network linking 206 different
countries. It is constructed [21] using the 616000 direct or indirect subsidiaries of the
4000 world corporations with the highest turnover, based on the 2007 version of the ORBIS
database obtained from the Bureau van Dijk Electronic Publishing (BvDEP) (see footnote 2).
The network is weighted, and its weights represent the business ties among countries [21].

2. The collaboration network of scientists working in network science (SCIE). This network
contains the co-authorship relations of scientists working on network theory and
experiment, as compiled by M. Newman [50]. The network is weighted, and its weights are
assigned as described in [51].

3. The neural network of the nematode C. Elegans (CEL). This network was compiled by D.
Watts and S. Strogatz [52] using the original experimental data by White et al [53]. It is a
weighted representation of the neural network of C. Elegans.

4. The U.S. Air transportation network (AIR). This is a weighted network obtained by
considering the 500 US airports with the largest amount of traffic from publicly available
data [54]. Nodes represent US airports and edges represent air travel connections among
them. It reports the anonymized list of connected pairs of nodes and the weight associated
to the edge, expressed in terms of the number of available seats on the given connection on
a yearly basis.

In Table 1 we provide some detailed statistical properties of the above networks. For our
analysis, if not stated otherwise, when we talk about the network we refer to the largest
connected component (LCC), and whenever we discuss network properties these are calculated
from the LCC.

In Table 2 we compare the network hierarchies obtained by applying the U_k-shell and the
W_k-shell decomposition methods. We observe that the W_k-shell method yields a more refined
partitioning (larger number of k-shells) of the networks. This means that by applying this
method we obtain more detailed information about the networks' internal structure, which is
similar to using a high resolution microscope to observe small size structures of a larger
system.

Furthermore, for three out of the four studied networks the core obtained with the
W_k-shell method contains a smaller number of nodes, while these nodes are almost entirely
part of the core obtained by the U_k-shell method. This means that the weighted method in
most cases is able to further split the cores obtained by the unweighted method and to
identify which are the most central of the central nodes.

In Fig. 2 we plot the degrees of the nodes according to the k-shell they belong to
(expressed as the distance from the core of the network). The node ranking is obtained
using the W_k-shell method for all four networks described above. As shown in Fig. 2, the
degree is highly (and non-linearly) correlated with the position of the node in the k-shell
structure, but there are particular cases where the trend is not monotonic. This means that
there are nodes with high degree that may not be as central to the network as one would
expect; this is in line with our discussion of the example network of Fig. 1.

2 Bureau van Dijk Electronic Publishing (BvDEP) http://www.bvdep.com/

Table 1: Statistical properties of the networks used in our analysis. Here N_N is the
number of nodes, N_E is the number of edges, ⟨k⟩ is the average degree of the network nodes,
d the diameter, C the clustering coefficient [?], and B the network's betweenness [38, ?].
If the original network is disconnected, we only consider its largest connected component.

Network   N_N   N_E    ⟨k⟩     d    C      B
CON       206   2886   28.0    4    0.38   94.6
SCIE      379   914    4.82    17   0.43   952.9
CEL       297   2345   15.8    5    0.18   215.4
AIR       500   2980   11.92   7    0.35   496.7

Table 2: Comparison of the network hierarchies obtained by the U_k-shell and W_k-shell
decomposition methods. Here s_U and s_W are the total numbers of k-shells, while n_c^U and
n_c^W are the total numbers of nodes in the cores obtained using the U_k-shell and the
W_k-shell method respectively. N_c is the number of common nodes in both cores, N_UW is the
fraction of nodes that belong to the core obtained by the U_k-shell method that also belong
to the core obtained by the W_k-shell method, and N_WU is the fraction of nodes of the core
obtained by the W_k-shell method that also belong to the core obtained by the U_k-shell
method.

4.1 A detailed example: analysis of the core of an economic network 

Next we compare the cores of the U_k-shell and the W_k-shell decomposition methods applied
to the global Corporate Ownership Network (CON) studied in [21]. The CON connects 206
countries around the globe, using as links the ownership relations within large companies.
If companies listed in country A have subsidiary corporations in country B, there is a link
connecting these two countries directed from country A to country B. The weight of the
link, w_AB, equals the number of the subsidiary corporations in country B controlled by
companies of country A.

Figure 3: Changes in the CON network structure when using different weight cut-off values
w_c. Panels A), B), and C) show the network snapshots around the central region for
w_c = 3, w_c = 75, and w_c = 150 respectively. The size of the nodes is proportional to
their degree. D) Evolution of the core size as a function of w_c (after Garas et al [21]).
E) Fraction of nodes and links of the original network that remain for different w_c values.

Using the U_k-shell decomposition method, as shown in Table 2 and Fig. 3, we identify a core
of 41 countries. However, we expect that in the current state of the global economy a
smaller set of countries are the major players (G8, G20, etc). In order to reduce the size
of the core, and to highlight which are the potentially more important nodes of this network
by using the classic k-shell decomposition method, a cut-off value of w_c = 100 was assumed
in Garas et al [21]. It was shown that the network remaining after filtering out the links
with weights below w_c = 100 contains only 66 out of the original 206 nodes. However, a core
formed by the following 12 countries was identified: United States of America (US), United
Kingdom (GB), France (FR), Germany (DE), Netherlands (NL), Japan (JP), Sweden (SE), Italy
(IT), Switzerland (CH), Spain (ES), Belgium (BE), and Luxembourg (LU). In Fig. 3 the
evolution of the core and network size of the CON is shown as a function of the weight
cut-off value w_c.

Using the W_k-shell decomposition method we obtain the layered structure of the network
including all the 206 nodes, without using any arbitrary cut-off parameter. The core of the
network obtained with this method consists of the following 11 countries: United States of
America (US), United Kingdom (GB), France (FR), Germany (DE), Netherlands (NL), Japan (JP),
Canada (CA), Italy (IT), Switzerland (CH), Spain (ES), and Belgium (BE). Comparing these two
cores we find a striking similarity. The only differences are the presence of Canada (CA)
in the core calculated using our new weighted k-shell approach, while Sweden (SE) and
Luxembourg (LU) have moved to the second innermost layer. These differences can be well
understood, considering that CA is a major economy, it is part of the G7, and all the other
six members of the G7 are already part of the core. Furthermore, CA outperforms SE and LU in
terms of population and other macroeconomic indicators, such as total imports/exports and
GDP. It is thus natural to conclude that the core obtained using the W_k-shell decomposition
method is more meaningful from an economics perspective, since it groups together some of
the largest (developed) global
economies.
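
The comparison of the two cores listed above amounts to simple set operations. The sketch below uses only the country codes given in the text; the variable names are chosen for this example, and the two fractions are analogous to the overlap fractions defined in the caption of Table 2.

```python
# Core found with the classic method after applying the cut-off w_c = 100 [21].
cutoff_core = {"US", "GB", "FR", "DE", "NL", "JP", "SE", "IT", "CH", "ES", "BE", "LU"}
# Core found with the weighted method on the full network.
weighted_core = {"US", "GB", "FR", "DE", "NL", "JP", "CA", "IT", "CH", "ES", "BE"}

common = cutoff_core & weighted_core             # countries present in both cores
only_weighted = weighted_core - cutoff_core      # gained by the weighted method
only_cutoff = cutoff_core - weighted_core        # moved out of the core

frac_cutoff_in_weighted = len(common) / len(cutoff_core)
frac_weighted_in_cutoff = len(common) / len(weighted_core)

print(len(common), sorted(only_weighted), sorted(only_cutoff))
print(round(frac_cutoff_in_weighted, 2), round(frac_weighted_in_cutoff, 2))
```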



5 Dynamics: Shell positioning and spreading potential 

Recently, models like the Susceptible-Infectious-Recovered (SIR) model [56] have been used
extensively in network research in order to explore epidemic spreading [27, ?-58], economic
crisis spreading [21], as well as information and rumor spreading [26, ?] in social
processes. However, in such processes the topology of the network is not the only thing that
matters; the position of the node where the spreading begins plays an important role as
well. In the recent work of Kitsak et al [48] it was shown that the spreading power of a
node cannot be predicted solely based on its degree. A better measure is its actual position
in the network, as described by the k-shell to which it belongs.

From this perspective, it is reasonable to assume that a k-shell partitioning method
provides us with a more accurate node ranking for representing the nodes' spreading power.
Additionally, since the individual nodes are grouped in k-shells, it is reasonable to assume
that every k-shell should contain nodes with similar spreading power. In what follows we
will use these assumptions to evaluate and compare the performance of the U_k-shell and
W_k-shell decomposition methods.

We modeled spreading processes by applying the SIR model to all the networks described
above. However, since we are interested in the weights of the network, we used a version of
the SIR model which takes into account the weight of the links that mediate the spreading.
This model was originally introduced to simulate the spreading of an economic crisis [21];
for this model the probability of infection is different for every link and is calculated by

p_{ij} \propto m \cdot w_{ij} / w_j,    (2)

where w_ij is the weight of the link that connects the origin node i with the destination
node j, and w_j is the total weight (w_j = \sum_i w_{ij}) of the destination node j. The
factor m is a free amplification parameter that can determine, for example, the severity of
a crisis, how infectious a virus is, the importance of a rumor etc. In what follows we will
call this model Weighted SIR (W-SIR).

Figure 4: Average infected fraction of a k-shell versus the shell's distance from the core
of the network.

The modeling procedure of the W-SIR is the following. Initially we assign all nodes to be
susceptible (S) to an infection. Next, one node, i, is chosen and is assumed to be infected
(I). This node will infect each of its neighboring nodes j with probability p_ij during the
first time step. This causes all newly infected nodes to switch their status from S to I,
while the node that initiated this process changes to the recovered state (R), and can no
longer infect other nodes or become infected. At every consecutive time step the process is
repeated, and all the infected nodes try to infect their susceptible (S) neighbors in the
network. The process lasts until there are no infected
nodes left in the network.
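
The procedure above can be sketched as a short simulation. This is an illustrative sketch under stated assumptions, not the implementation used for the results: Eq. (2) only fixes the infection probability up to proportionality, so here it is taken as min(1, m w_ij / w_j); every infected node is assumed to recover after one time step; and the infected fraction is assumed to count all nodes that were ever infected, including the seed. The names `w_sir` and `spreading_potential` and the three-node path are hypothetical.

```python
import random


def w_sir(graph, seed_node, m, rng):
    """One W-SIR run on graph = {node: {neighbor: weight}}; returns infected fraction."""
    # Total weight w_j of every node (sum of its link weights).
    total_w = {u: sum(nbrs.values()) for u, nbrs in graph.items()}
    state = {u: "S" for u in graph}
    state[seed_node] = "I"
    infected = [seed_node]
    while infected:
        newly_infected = []
        for i in infected:
            for j, w_ij in graph[i].items():
                # Assumed form of Eq. (2): probability capped at 1.
                p_ij = min(1.0, m * w_ij / total_w[j])
                if state[j] == "S" and rng.random() < p_ij:
                    state[j] = "I"
                    newly_infected.append(j)
        for i in infected:
            state[i] = "R"          # assumed: recovery after one time step
        infected = newly_infected
    return sum(s == "R" for s in state.values()) / len(state)


def spreading_potential(graph, node, m, runs=100, seed=0):
    """Average infected fraction over several runs started from the same node."""
    rng = random.Random(seed)
    return sum(w_sir(graph, node, m, rng) for _ in range(runs)) / runs


# Hypothetical toy network: path A - B - C with unit weights.
path = {"A": {"B": 1}, "B": {"A": 1, "C": 1}, "C": {"B": 1}}
print(spreading_potential(path, "B", m=1.0))
```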

For each individual node we performed 100 realizations of the W-SIR model, and we calculated
the average infected fraction of the network for different values of m ∈ [0, 10]. This
fraction is used as a score in order to rank the nodes according to their spreading
potential. We restricted ourselves to values of m in this interval, as for much larger m
values the role of individual nodes is no longer important, and an epidemic outbreak emerges
no matter where the infection starts.

Figure 5: Average value, ⟨σ⟩, over all shells of the standard deviation of the spreading
potential of nodes within a k-shell, versus m.

Next, we partitioned the network using the U_k-shell and the W_k-shell decomposition
methods, and ranked the obtained k-shells according to their distance from the core. By
calculating the average infected fraction that results from an epidemic starting separately
from all nodes of every individual k-shell, we estimated the shell's spreading potential.

In Fig. 4 we study how the average infected fraction changes versus the distance of each
k-shell from the core of the network for both methods. We find that, in general, the
central k-shells obtained by the W_k-shell method are more able to initiate a severe
outbreak in comparison to the central k-shells obtained using the U_k-shell method. This
result is robust for all networks used in this study, and for different values of the
parameter m. The above finding means that the W_k-shell decomposition method positions the
nodes with the higher average spreading potential in shells closer to the core.

Next, we tested how homogeneous the obtained k-shells are with respect to the spreading
potential of the nodes they contain. In order to do so, we calculated the standard
deviation, σ, of a node's infected fraction (spreading potential) for every k-shell for a
given value of the parameter m. Next we calculated the average value over all the shells,
⟨σ⟩, and we plot it versus m (Fig. 5). We find that the average standard deviation of the
spreading potential using W-SIR is always lower when we partition the network using the
W_k-shell method than when we partition it using the U_k-shell method. This means that the
W_k-shell method gives more homogeneous k-shells,
where all nodes in the shell have similar importance for the dynamical process in question.
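
The homogeneity measure described above can be computed as in the following sketch. The function name `mean_shell_std` and the toy numbers are assumptions for this example, and the population standard deviation is used because the text does not state which estimator was applied.

```python
import statistics
from collections import defaultdict


def mean_shell_std(shell_of, potential):
    """Average over shells of the std of the nodes' spreading potential.

    shell_of:  {node: shell index}
    potential: {node: average infected fraction when the outbreak starts at node}
    """
    by_shell = defaultdict(list)
    for node, k_s in shell_of.items():
        by_shell[k_s].append(potential[node])
    # Population standard deviation per shell (a one-node shell contributes 0).
    sigmas = [statistics.pstdev(values) for values in by_shell.values()]
    return sum(sigmas) / len(sigmas)


# Hypothetical toy data: two shells with two nodes each.
shell_of = {"a": 1, "b": 1, "c": 2, "d": 2}
potential = {"a": 0.1, "b": 0.3, "c": 0.5, "d": 0.5}
print(mean_shell_std(shell_of, potential))
```

A lower value means that nodes placed in the same shell have more similar spreading potential, which is the sense in which the two partitionings are compared.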

As a final step, and given that the W_k-shell method performs better in positioning the
nodes according to their W-SIR spreading potential in weighted graphs, it is interesting to
further explore the role of the weights in this process. To do so, we created 10
realizations of the CON network with shuffled weights, and we performed 100 runs of the
W-SIR model on every one of these 10 networks. Next, we calculated the average spreading
potential per k-shell using the infected fraction obtained by the implementation of W-SIR on
the network with shuffled weights. As shown in Fig. 6, in the shuffled case the k-shells
become significantly more inhomogeneous, and their ⟨σ⟩ is always larger than the ⟨σ⟩
obtained for the original, unshuffled network. This procedure highlights the role of the
weights in the process, since in the case where the weights do not play any role these two
curves should collapse into one.

Figure 6: Comparison of ⟨σ⟩ versus m for two different configurations of the CON.
"W_k-shell - W-SIR" is the original case (also shown in Fig. 5), where the nodes' spreading
potential is obtained by applying the W-SIR to the original network. "W_k-shell -
(Sh)W-SIR" is a case where we calculated the nodes' spreading potential by applying the
W-SIR to the 10 realizations of the CON with shuffled weights.
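
The weight-shuffling test described above can be sketched as follows. The text does not specify how the shuffling was carried out, so this example assumes the simplest reading: the topology is kept fixed and the existing weights are randomly permuted across the links. The function name `shuffle_weights` and the toy link list are hypothetical.

```python
import random


def shuffle_weights(links, rng):
    """Return a copy of {link: weight} with the weights permuted across links."""
    keys = list(links)               # topology: which links exist
    values = list(links.values())    # the multiset of weights
    rng.shuffle(values)              # random permutation of the weights
    return dict(zip(keys, values))


# Hypothetical weighted links; ten shuffled realizations as in the text.
links = {("A", "B"): 3, ("A", "C"): 1, ("B", "C"): 7, ("C", "D"): 2}
rng = random.Random(42)
realizations = [shuffle_weights(links, rng) for _ in range(10)]
print(len(realizations))
```

Each realization keeps every node's degree unchanged while breaking any relation between a link's position and its weight, which is what allows the role of the weights to be isolated.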

6 Conclusion 

In summary, we presented a generalized k-shell decomposition method (W_k-shell) that
considers the link weights of networks, without applying any arbitrary cut-off threshold on
their value. The method recovers the same shell structure obtained by the classic k-shell
decomposition in the absence of weights, but when weights are present, it is able to
partition the network in a more refined way. In its general formulation, our method allows
us to vary the importance assigned to either the node weights or the node degree, by
adjusting the exponents α and β of Eq. (1). Whilst in the current work we did not fully
explore the parameter space, we would like to stress that this additional flexibility can
provide a more accurate ranking for various applications. Here, using α = β = 1, we showed
that the partitioning obtained by the W_k-shell method is particularly meaningful in terms
of the spreading potential of the nodes. We applied the weighted version of the SIR model
to four different networks, and showed that nodes with higher spreading potential were
positioned in the core or in shells closer to the core more reliably than with the
U_k-shell method.